Return-Path: <news@newsmaster.cc.columbia.edu>
Received: from newsmaster.cc.columbia.edu (newsmaster.cc.columbia.edu [128.59.35.30])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id KAA15001
    for <kermit.misc@watsun.cc.columbia.edu>; Wed, 26 Aug 1998 10:19:31 -0400 (EDT)
Received: (from news@localhost)
    by newsmaster.cc.columbia.edu (8.8.5/8.8.5) id KAA03002
    for kermit.misc@watsun; Wed, 26 Aug 1998 10:19:31 -0400 (EDT)
Path: news.columbia.edu!panix!howland.erols.net!news.idt.net!psinntp!pubxfer.news.psi.net!usenet
From: "Bob Kennedy" <bkennedy@peco-energy.com>
Newsgroups: comp.protocols.kermit.misc
Subject: How I Implemented an application using C-Kermit fr VMS
Date: Wed, 26 Aug 1998 10:13:20 -0400
Organization: PSINet
Lines: 89
Message-ID: <6s156a$s0k$1@client3.news.psi.net>
NNTP-Posting-Host: 159.214.60.222
X-Newsreader: Microsoft Outlook Express 4.72.2106.4
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.2106.4
Xref: news.columbia.edu comp.protocols.kermit.misc:9142

Plant Monitoring System
System Health Monitor

Written By: Robert S. Kennedy
Date: August 14, 1998

I work for a utility company, and one of my many functions here is to maintain a Plant Monitoring System (PMS). This PMS is a complex, redundant system. It uses a high-speed, intelligent, distributed Data Acquisition System (DAS) to collect and produce plant data, and transmit this data to host processors. The data is manipulated and validated by application programs, and then presented to operations personnel and other pertinent users on video terminals, plotters, and printers. The key to configuring, controlling, and maintaining this complex configuration is knowledge of the hardware, peripheral components, applications, and the software design and implementation.

The PMS runs on two VAX 4000-600 host computers per unit, on ethernet nodes. There are 24 additional 4000-90a micro-vax workstations. We use VMS ver. 6.1 on our 4 main hosts.

Because of the complexity of this system, it can fail in many ways: disk space is usually at a premium, printer queues regularly fail or stop, or the DAS shuts down. Any one of these failures is highly visible to the users of the system, so it is important to resolve them in a timely manner. In the past, we had a pager that the operations department could call in the event of a computer emergency. We would respond as quickly as we could to resolve the problem, but many times the effort was too late. The need to respond more quickly, or to look for emerging issues, was becoming more and more pressing.

After a brainstorming session with other individuals from my group, we came up with a solution: a DCL command procedure could run once an hour and check various items in the system, such as disk space, stopped queues, or whether the DAS was down. If a problem existed on any hour, any one of the four host nodes could call us on our digital pager with a numeric code describing the problem. That solution worked, and worked well, until we started adding more functions to this "System Health Checker". The more we added, the more codes were needed, and pretty soon we needed to carry around a "cheat sheet" with a description of all the codes. What we needed was one of those new "fancy dancy" alphanumeric pagers, so the system could call us and tell us in "English", not numbers, what had gone wrong.

While 'surfing' the Internet one day, I happened upon a website, 'www.columbia.edu', and found a 'Kermit' page there. I was familiar with the use of "Kermit" through our VAX/VMS systems, and started reading about the new release of "C-Kermit". Being a programmer who knows how to read and program in "C", I became more curious. I downloaded the newest version of "C-Kermit" for our VAX/VMS system and installed it, and found that it ran perfectly and without a hitch.
As I went on to read the miscellaneous documents, I found that this version of "C-Kermit" was not freeware but shareware, and that I should purchase the book as payment for the program. After receiving the book, I breezed through it and found a section on "Calling an Alphanumeric Pager". This was just what the doctor ordered. After talking to the company that supplies our pagers, I received all the necessary documents on how to use their Telocator Alphanumeric Protocol (TAP). Using the example macro in the C-Kermit book on "sending a one-line alpha page using TAP" as a guide, I was able to send alpha messages to our newly acquired on-call alpha pager. Armed with this new technology, I applied it to our "Health Checker".

Our "Health Checker" now performs approximately 17 separate functions on an hourly basis per host node. The system monitors disk space, queues, the DAS, other systems attached to our PMS through a serial link, and miscellaneous applications; it also monitors itself. One of the four nodes is responsible for checking the other three to see whether the "Health Checker" is running in the "SYS$BATCH" queue. Each of these sub-modules can call the alpha pager with a message describing the problem. This is done through a system-wide logical name pointing to a DCL command procedure that basically runs "C-Kermit" with a script that uses TAP. Once an hour all four nodes check their respective functions, and twice a day the system pages us with a message that each of the four nodes is "AOK".

C-Kermit opened the door for us to search proactively for problems, and gave us a means of being contacted in an emergency with a well-defined "English" message. This is most effective around 03:30 in the morning, when the system calls to tell us that the DAS is down or that some other problem has occurred.
If we were still using that digital pager, we would need to find the "cheat sheet" and then look up the code describing the problem. This, for most of us, is difficult at three in the morning.
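
For readers curious what the TAP step involves, here is a minimal sketch, in Python rather than C-Kermit script, of how a single TAP message block is framed and checksummed. It is not the macro from the C-Kermit book, nor the procedure used on the PMS. The function names, pager ID, and message text are made up for illustration, and dialing the modem and the TAP login handshake are left out.

```python
# Illustrative sketch only: frames one TAP message block.
# Names and sample values are hypothetical, not taken from the PMS.

STX = "\x02"  # start of text
ETX = "\x03"  # end of text (last block of a transaction)
CR = "\r"     # field and block terminator


def tap_checksum(block: str) -> str:
    """Return the three-character checksum for a block (STX through ETX).

    The 7-bit values of all characters are summed, the low 12 bits of the
    sum are split into three 4-bit groups, and 0x30 is added to each group
    to make it a printable character.
    """
    total = sum(ord(ch) & 0x7F for ch in block) & 0xFFF
    return "".join(chr(0x30 + ((total >> shift) & 0xF)) for shift in (8, 4, 0))


def tap_block(pager_id: str, message: str) -> str:
    """Build STX, pager ID, CR, message, CR, ETX, checksum, CR."""
    body = f"{STX}{pager_id}{CR}{message}{CR}{ETX}"
    return body + tap_checksum(body) + CR


if __name__ == "__main__":
    # Hypothetical pager ID and message text.
    block = tap_block("5551234", "NODE A: DAS IS DOWN")
    print(repr(block))
```

For pager ID 123 and message ABC, the character values sum to 379 (hex 17B), so the checksum is the three characters `17;`, followed by the closing carriage return.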